Probability of detecting disease-associated single nucleotide polymorphisms in case-control genome-wide association studies.
Abstract
Some case-control genome-wide association studies (CCGWASs) select promising single nucleotide polymorphisms (SNPs) by ranking corresponding p-values, rather than by applying the same p-value threshold to each SNP. For such a study, we define the detection probability (DP) for a specific disease-associated SNP as the probability that the SNP will be "T-selected," namely have one of the top T largest chi-square values (or smallest p-values) for trend tests of association. The corresponding proportion positive (PP) is the fraction of selected SNPs that are true disease-associated SNPs. We study DP and PP analytically and via simulations, both for fixed and for random effects models of genetic risk, that allow for heterogeneity in genetic risk. DP increases with genetic effect size and case-control sample size and decreases with the number of nondisease-associated SNPs, mainly through the ratio of T to N, the total number of SNPs. We show that DP increases very slowly with T, and the increment in DP per unit increase in T declines rapidly with T. DP is also diminished if the number of true disease SNPs exceeds T. For a genetic odds ratio per minor disease allele of 1.2 or less, even a CCGWAS with 1000 cases and 1000 controls requires T to be impractically large to achieve an acceptable DP, leading to PP values so low as to make the study futile and misleading. We further calculate the sample size of the initial CCGWAS that is required to minimize the total cost of a research program that also includes follow-up studies to examine the T-selected SNPs. A large initial CCGWAS is desirable if genetic effects are small or if the cost of a follow-up study is large.
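The ranking idea behind DP can be illustrated with a small Monte Carlo sketch. This is not the authors' procedure. It assumes a single disease SNP among otherwise independent null SNPs, a 1-df chi-square trend statistic for every SNP, and a noncentral chi-square distribution for the disease SNP. The `noncentrality` argument is a hypothetical input standing in for the combined effect of odds ratio, allele frequency, and sample size; the function names are illustrative only. Conditional on the disease SNP's statistic, the number of null SNPs that outrank it is binomial, so DP is the average probability that fewer than T of them do.

```python
import math
import random


def null_tail_prob(stat):
    """P(chi-square with 1 df > stat), i.e. the p-value under no association."""
    return math.erfc(math.sqrt(stat / 2.0))


def binom_cdf(k_max, n, p):
    """P(Binomial(n, p) <= k_max), summed term by term in log space."""
    if k_max >= n or p <= 0.0:
        return 1.0
    if p >= 1.0:
        return 0.0
    log_p, log_q = math.log(p), math.log1p(-p)
    log_n_fact = math.lgamma(n + 1)
    total = 0.0
    for k in range(k_max + 1):
        log_term = (log_n_fact - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                    + k * log_p + (n - k) * log_q)
        total += math.exp(log_term)
    return min(total, 1.0)


def detection_probability(noncentrality, n_snps, top_t, n_reps=2000, seed=0):
    """Estimate DP for one disease SNP among n_snps - 1 independent null SNPs.

    Assumption: the disease SNP's statistic is (Z + sqrt(noncentrality))**2
    with Z standard normal, i.e. a noncentral chi-square with 1 df.
    """
    rng = random.Random(seed)
    shift = math.sqrt(noncentrality)
    total = 0.0
    for _ in range(n_reps):
        disease_stat = (rng.gauss(0.0, 1.0) + shift) ** 2
        # Chance that any one null SNP outranks the disease SNP.
        p_beat = null_tail_prob(disease_stat)
        # The disease SNP is T-selected if at most T - 1 null SNPs outrank it.
        total += binom_cdf(top_t - 1, n_snps - 1, p_beat)
    return total / n_reps


if __name__ == "__main__":
    # Illustrative settings only; none of these values come from the paper.
    for top_t in (10, 100, 1000):
        dp = detection_probability(noncentrality=10.0, n_snps=100_000, top_t=top_t)
        print(f"T = {top_t:5d}  estimated DP = {dp:.3f}")
```

Under these assumptions, with exactly one disease SNP the expected proportion positive is simply DP / T, which makes the trade-off described in the abstract visible: raising T buys a small gain in DP at a steep cost in PP.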
Related resources
Association study of two single nucleotide polymorphisms rs10757278 and rs1333049 with atherosclerosis, a case-control study from Iraq
Atherosclerosis is one of the most important coronary artery diseases (CAD), caused by lipid accumulation, hypertension, smoking, and many other environmental and genetic factors. Genetic variations in rs10757278 and rs1333049 have been reported to correlate with CAD. In the present study, 100 blood samples were collected (50 CAD patients and 50 apparently healthy con...
Single Nucleotide Polymorphisms and Association Studies: A Few Critical Points
Uncovering DNA sequence variations that correlate with phenotypic changes, e.g., diseases, is the aim of sequence variation studies. A common type of sequence variation is the single nucleotide polymorphism (SNP, pronounced "snip"). SNPs are third-generation molecular markers. A SNP represents a DNA sequence variant of a single base pair with the minor allele occurring in more than 1% of a given popula...
Association Study of rs1333040 and rs1004638 Polymorphisms in the 9p21 Locus with Coronary Artery Disease in Southwest of Iran
Background: Coronary artery disease (CAD) is a multifactorial and heterogeneous disease. Recently, genome-wide association studies have reported that the rs1333040 (C/T) and rs1004638 (A/T) single nucleotide polymorphisms (SNPs) in the 9p21 locus have a very strong association with CAD. This study aimed to examine these associations in southwest Iran. Methods: Blood samples were collected from 200 C...
Association of two single nucleotide polymorphisms rs10407022 and rs3741664 with the risk of primary ovarian insufficiency in a sample of Iraqi women
Primary ovarian insufficiency (POI) can be a devastating disease affecting women below the age of forty. It involves a major decrease in the number and quality of oocytes, or ovarian reserve. The main objective of this study is to examine the distribution of the single-nucleotide polymorphisms rs10407022 and rs3741664 in Iraqi people and their association with primary ovarian insufficiency. The m...
Detecting gene-by-smoking interactions in a genome-wide association study of early-onset coronary heart disease using random forests
BACKGROUND Genome-wide association studies are often limited in their ability to attain their full potential due to the sheer volume of information created. We sought to use the random forest algorithm to identify single-nucleotide polymorphisms (SNPs) that may be involved in gene-by-smoking interactions related to the early-onset of coronary heart disease. METHODS Using data from the Framing...
Journal title:
- Biostatistics
Volume 9, Issue 2
Publication date: 2008